09. Quiz: Expected Sarsa
Say that an agent is learning to navigate the gridworld described earlier in the lesson.

[Figure: Gridworld Example]
Suppose the agent uses Expected Sarsa to search for the optimal policy, with step size \alpha = 0.1.
At the end of the 99th episode, the Q-table has the following values:

[Figure: Q-table]
Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives a reward of -1, and the next state is state 2.

[Figure: Beginning of the 100th episode]
In the previous video, you learned that at this point in time, the agent updates the Q-table.
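As a reminder of how that update works, here is a minimal Python sketch of a single Expected Sarsa step. It assumes an epsilon-greedy policy. The Q-values, the discount rate gamma, and epsilon used below are hypothetical placeholders, not the values from the quiz's Q-table, and the function names are illustrative only. Only alpha = 0.1 and the reward of -1 come from the text above.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return epsilon-greedy action probabilities for one state."""
    n = len(q_values)
    # Every action gets an equal share of the exploration probability.
    probs = [epsilon / n] * n
    # The greedy action (first maximum) gets the remaining probability mass.
    best = max(range(n), key=lambda a: q_values[a])
    probs[best] += 1.0 - epsilon
    return probs


def expected_sarsa_update(q_sa, reward, next_q_values, alpha, gamma, epsilon):
    """Return the updated Q(s, a) after one Expected Sarsa step."""
    probs = epsilon_greedy_probs(next_q_values, epsilon)
    # Expected value of the next state under the current policy.
    expected_next = sum(p * q for p, q in zip(probs, next_q_values))
    return q_sa + alpha * (reward + gamma * expected_next - q_sa)


# Hypothetical numbers for illustration; substitute the quiz's Q-table values.
q_state1_right = 5.0                 # assumed Q(state 1, right)
q_state2 = [4.0, 6.0, 8.0, 6.0]      # assumed Q(state 2, a) for each action
new_value = expected_sarsa_update(
    q_state1_right, reward=-1.0, next_q_values=q_state2,
    alpha=0.1, gamma=1.0, epsilon=0.2,   # gamma and epsilon are assumptions
)
print(round(new_value, 4))
```

Unlike Sarsa, the update does not wait for the next action to be sampled: it averages over all actions in state 2, weighted by the policy's probabilities.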